| Model Name | Brief Description | Code Syntax |
|---|---|---|
| UMAP | UMAP (Uniform Manifold Approximation and Projection) is a nonlinear dimensionality reduction technique. Pros: Fast, scales well to large datasets, and preserves more global structure than t-SNE. Cons: Results are sensitive to hyperparameter choices. Applications: Data visualization, feature extraction. Key hyperparameters: `n_neighbors`, `min_dist`, `n_components`, `metric`. | |
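| t-SNE | t-SNE (t-Distributed Stochastic Neighbor Embedding) is a nonlinear dimensionality reduction technique. Pros: Good at revealing local cluster structure in high-dimensional data. Cons: Computationally expensive; can produce spurious clusters, and cluster sizes and inter-cluster distances in the embedding are not meaningful. Applications: Data visualization, anomaly detection. Key hyperparameters: `perplexity`, `learning_rate`, `early_exaggeration`, `n_components`. | |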
| PCA | PCA (Principal Component Analysis) is a linear dimensionality reduction technique. Pros: Easy to interpret, reduces noise. Cons: Captures only linear structure, so it can lose information in nonlinear data. Applications: Feature extraction, compression. Key hyperparameters: `n_components`, `whiten`, `svd_solver`. | |
| DBSCAN | DBSCAN (Density-Based Spatial Clustering of Applications with Noise) is a density-based clustering algorithm. Pros: Labels outliers as noise and does not require the number of clusters in advance. Cons: Struggles when clusters have different densities. Applications: Anomaly detection, spatial data clustering. Key hyperparameters: `eps`, `min_samples`, `metric`. | |
| HDBSCAN | HDBSCAN (Hierarchical DBSCAN) extends DBSCAN to handle clusters of varying density. Pros: Robust to varying densities; needs no single global `eps`. Cons: Can be slower than DBSCAN. Applications: Large datasets, complex clustering problems. Key hyperparameters: `min_cluster_size`, `min_samples`, `cluster_selection_epsilon`. | |
| K-Means clustering | K-Means is a centroid-based clustering algorithm that partitions data into k clusters. Pros: Efficient, simple to implement. Cons: Requires choosing k in advance and is sensitive to the initial centroids. Applications: Customer segmentation, pattern recognition. Key hyperparameters: `n_clusters`, `init`, `n_init`, `max_iter`. | |

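
A minimal sketch of how the models in the table above can be called, assuming scikit-learn for PCA, t-SNE, K-Means, DBSCAN, and HDBSCAN (`sklearn.cluster.HDBSCAN`, added in scikit-learn 1.3) and the separate `umap-learn` package (imported as `umap`) for UMAP. This is not the original author's code; the synthetic data and values such as `eps=1.0` are illustrative assumptions, not tuned recommendations. The UMAP and HDBSCAN steps are skipped if the package or version is unavailable.

```python
from sklearn.cluster import DBSCAN, KMeans
from sklearn.datasets import make_blobs
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE

# Toy data (assumption): 300 points in 10-D drawn around 3 centers.
X, y_true = make_blobs(n_samples=300, n_features=10, centers=3, random_state=42)

# --- Dimensionality reduction ---
# PCA: linear projection; n_components sets the output dimension.
X_pca = PCA(n_components=2).fit_transform(X)

# t-SNE: nonlinear embedding; perplexity must be smaller than n_samples.
X_tsne = TSNE(n_components=2, perplexity=30, init="pca",
              random_state=42).fit_transform(X)

# UMAP: provided by the separate umap-learn package (import name: umap).
try:
    import umap
    X_umap = umap.UMAP(n_neighbors=15, min_dist=0.1, n_components=2,
                       random_state=42).fit_transform(X)
except ImportError:
    X_umap = None

# --- Clustering (on the 2-D PCA projection to keep the example small) ---
# K-Means: k is chosen up front; n_init restarts reduce sensitivity to init.
km_labels = KMeans(n_clusters=3, n_init=10, random_state=42).fit_predict(X_pca)

# DBSCAN: eps is the neighborhood radius; label -1 marks noise points.
db_labels = DBSCAN(eps=1.0, min_samples=5).fit_predict(X_pca)

# HDBSCAN: in scikit-learn since 1.3 (the standalone hdbscan package is an alternative).
try:
    from sklearn.cluster import HDBSCAN
    hdb_labels = HDBSCAN(min_cluster_size=10).fit_predict(X_pca)
except ImportError:
    hdb_labels = None

n_db_clusters = len(set(db_labels)) - (1 if -1 in db_labels else 0)
print("K-Means clusters:", len(set(km_labels)), "| DBSCAN clusters:", n_db_clusters)
```

Clustering on a 2-D projection is only a convenience here; on real data you would usually cluster scaled original features and tune `eps` (DBSCAN) or `min_cluster_size` (HDBSCAN) for the dataset.
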
| Method | Brief Description | Code Syntax |
|---|---|---|
| make_blobs | Generates isotropic Gaussian blobs for clustering. | |
| multivariate_normal | Generates samples from a multivariate normal distribution. | |
| plotly.express.scatter_3d | Creates a 3D scatter plot using Plotly Express. | |
| geopandas.GeoDataFrame | Creates a GeoDataFrame from a pandas DataFrame. | |
| geopandas.GeoDataFrame.to_crs | Transforms the geometries of a GeoDataFrame to a different coordinate reference system (CRS). | |
| contextily.add_basemap | Adds a web-tile basemap to a matplotlib Axes (for example, a GeoDataFrame plot) for geographic context. | |
| pca.explained_variance_ratio_ | Attribute of a fitted PCA object holding the proportion of variance explained by each principal component. | |
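
A similarly hedged sketch of the helpers in the second table, assuming NumPy's `Generator.multivariate_normal` (SciPy's `scipy.stats.multivariate_normal` is an alternative) and scikit-learn. The Plotly and GeoPandas parts run only if those packages are installed; the coordinates are example values (roughly Paris, London, Berlin), and the contextily call is left commented out because it downloads map tiles over the network.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)

# make_blobs: 4 isotropic Gaussian clusters in 3-D, with their true labels.
X_blobs, labels = make_blobs(n_samples=200, centers=4, n_features=3, random_state=0)

# multivariate_normal: correlated 3-D Gaussian samples (NumPy Generator API).
mean = [0.0, 0.0, 0.0]
cov = [[1.0, 0.8, 0.0],
       [0.8, 1.0, 0.0],
       [0.0, 0.0, 0.1]]
X_corr = rng.multivariate_normal(mean, cov, size=500)

# explained_variance_ratio_: only available after PCA has been fitted.
pca = PCA(n_components=3).fit(X_corr)
ratios = pca.explained_variance_ratio_
print("explained variance ratio:", ratios.round(3))

# plotly.express.scatter_3d: 3-D scatter of the blobs, colored by label.
try:
    import plotly.express as px
    fig = px.scatter_3d(x=X_blobs[:, 0], y=X_blobs[:, 1], z=X_blobs[:, 2],
                        color=labels.astype(str))
    # fig.show()  # opens an interactive figure
except ImportError:
    fig = None

# geopandas: build a GeoDataFrame from a DataFrame, then reproject it.
try:
    import geopandas as gpd
    import pandas as pd
    df = pd.DataFrame({"lon": [2.35, -0.13, 13.40], "lat": [48.86, 51.51, 52.52]})
    gdf = gpd.GeoDataFrame(df, geometry=gpd.points_from_xy(df["lon"], df["lat"]),
                           crs="EPSG:4326")  # WGS84 longitude/latitude
    gdf_web = gdf.to_crs(epsg=3857)  # Web Mercator, the CRS of most web tiles
    # import contextily as cx
    # ax = gdf_web.plot()
    # cx.add_basemap(ax, crs=gdf_web.crs)  # downloads tiles; needs network
except ImportError:
    gdf_web = None
```

Because the covariance has one strongly correlated pair of features, the first principal component should carry most of the variance, which is what `explained_variance_ratio_` reports.
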